INTRODUCTION TO LEARNING MACHINES


By Jack G. Sheppard 
Manned Spacecraft Center 

SUMMARY 


Learning machines can perform routine intellectual tasks that are normally
the exclusive domain of human beings. Learning machines attempt to recognize and
to classify patterns and frequently are called adaptive pattern recognizers. Pattern
recognition is presently an active field. Several hundred technical reports and
articles are published annually on such diverse subjects as mechanisms of
human vision, military target recognition, adaptive-control systems, and optimum
communication receivers.

Learning machines, which operate on patterns, use a priori information or some
inherent pattern-class characteristic to decide the category in which a pattern belongs.
The key characteristic of learning machines is the ability to recognize general
relationships from a limited set of observations. Two principal learning-machine
structures are used. One structure assumes the separability of all categories and derives
exact decision procedures. The other structure uses a probabilistic approach and
attempts to make optimum decisions, always recognizing the possibility of error.

The majority of learning-machine research has emphasized machine structure;
application to specific problems has been limited. The structural theory is now
developed sufficiently to provide the tools for investigators to apply in various individual
fields.


INTRODUCTION 


For many years, scientists have been intrigued by the possibility of machines 
that can replace people in routine intellectual activities. A class of machines that are 
called learning machines or adaptive pattern classifiers now can perform some rou- 
tine tasks previously performed only by human beings. These tasks include weather 
prediction, handwriting analysis, speech analysis, target recognition, and medical di- 
agnosis. In many tasks, such as weather prediction, machines are superior to human 
beings in both speed and accuracy. In other tasks, such as speech recognition, the 
problems are so complex that the results are incomplete. 

Many intellectual processes and methods of mechanization have been investigated,
but most of these approaches were not feasible until the advent of high-speed digital
computers. Consequently, most of the significant work on learning machines has been
done within the last 15 years. As late as 1960, exact matching and correlation with
stored references were the most sophisticated techniques in use. Since 1960, the
concept of representing inputs as vectors in an n-dimensional space has gained acceptance.
Based on this concept, learning-machine theory has progressed rapidly. Presented
in this technical note are a discussion of the two basic approaches to pattern
recognition, the theory underlying each approach, and a computer-program-implementation
example of one of the important learning procedures of the nonparametric technique.


SYMBOLS 


$A$             square matrix
$a$             element of A
$B$             column vector
$b$             element of B
$C$             set of all pattern classification categories
$C_i$           ith classification category
$c$             element of C
$c'$            element of C
$D$             scalar
$F$             vector composed of functions of X
$f$             element of F
$g_i(X)$        ith discriminant function of X
$g_i'(X)$       the logarithm of $g_i(X)$
$L_X(i)$        conditional average loss
$L_X'(i)$       modified conditional average loss
$\ln$           natural logarithm
$M_i$           mean vector of the ith category
$p(j/x)$        density on j given that x has occurred
$p(x)$          density on X
$R$             n-dimensional vector space defined by the set of all X; R is called the
                pattern space
$R_i$           set of all patterns that should be classified into the category $C_i$
$R \times C$    Cartesian product of the sets R and C
$r$             element of R
$W$             set of n + 1 numbers collectively called a weight vector
$W'$            adjusted weight vector
$w_i$           ith component of a weight vector
$X$             set of n measurements $(x_1, x_2, \ldots, x_n)$ collectively called a pattern
$X^t$           row vector formed by the transposition of the column vector X
$X \in R_i$     X is an element of the set $R_i$
$x_i$           ith component of X
$Y$             augmented pattern vector
$\delta_{ij}$   Kronecker delta
$\lambda(i/j)$  loss incurred in selecting i when j actually has occurred
$\Sigma$        covariance matrix of a multivariate Gaussian density
$\Sigma^{-1}$   inverse of the matrix $\Sigma$
$\sigma$        element of $\Sigma$
$\sigma_i$      ith component of $\Sigma$

Subscripts:

$k$             number of categories into which patterns are to be classified
$M$             dimension of Φ space

Superscript:

$t$             a matrix transposed


HISTORY 


From earliest times, man has been interested in the mechanics of intellectual
processes (ref. 1). Approximately 320 B.C., Aristotle formulated his logic with the
purpose of systematizing rules of thought. Francis Bacon (1561 to 1626) sought to
establish rules for the systematic acquisition of knowledge. Thomas Hobbes (1588 to
1679) was concerned with thought processes; that is, the way the mind progresses in
an orderly and directed way from one thought to the next thought. John Locke (1632 to
1704) dealt with perception, which he considered to be "the inlet of all materials of
knowledge." Locke was one of the first to document the relationship between
repetitive perception and memory enhancement (ref. 2). George Boole exposed the
intellectual process of generalization through which an observer draws conclusions that are
greater and more comprehensive than the observations on which the conclusions are
based (ref. 3). This concept, generalization from limited observation, distinguishes
a learning machine from an ordinary process controller or computer.

The following men were pioneers in the development of learning machines.
Rosenblatt (ref. 4) contributed the Perceptron, a two-dimensional array of optical
sensors that is the basis of most image processors. Highleyman (ref. 5) contributed
significantly to linear-machine theory. Braverman (ref. 6) applied the Bayesian decision
theory, which led to the current work with parametric machines, and extended his work
in collaboration with Abramson (refs. 7 and 8). Sebestyen (ref. 9) demonstrated the
value of transformation on the pattern space for feature enhancement. Fralick (ref. 10)
has shown that optimum solutions in a machine of fixed size can be obtained without a
teacher.


PATTERNS 


Learning machines operate on patterns. (Much of this discussion is based
on the work of N. J. Nilsson, ref. 11.) A pattern is a set of measurements
$X = (x_1, x_2, \ldots, x_n)$ that represents some phenomenon of interest. Such a set can be
considered as a vector in an n-dimensional vector space. If the vector space
accurately represents the phenomenon of interest, each vector in the space represents
some state of the source phenomenon. If rules can be derived that allow nonambiguous
mappings from the space to a set of outcomes, a machine can accomplish these
mappings, and an automatic pattern classifier can be derived. Thus, learning machines
involve two basic problems. One problem is the selection of the components of the
pattern so that the pattern space accurately represents the phenomenon of interest.
The other problem is the determination of the rules that allow nonambiguous mappings
from the pattern space to a set of outcomes or classifications.

Selecting Pattern Components

The process of selecting the components of a pattern (ref. 12) involves the
customary conflict of interest; that is, the balance between measuring enough things and
the expense of making the measurements. The cost of processing, which is
proportional to the number of components of the pattern, further augments this expense. A
compromise must be derived that does not sacrifice critical information and yet is
economical. The form of the pattern is another concern. To simplify the classification
problem, the selected components should emphasize the differences among the patterns
of the categories. Sometimes, as in target recognition, a choice is not available. In
these cases, transformations on the pattern space may emphasize the correct
characteristics and lead to simplified processing. This process of selecting and transforming
the correct components of the pattern is intuitive and is not amenable to general
treatment. Therefore, this area was not of prime interest to early investigators. However,
the process is essential to any use of pattern-recognition techniques and will be a
principal area of concern to anyone who attempts to use a learning machine on a particular
problem.


Classifying Patterns 

After the components of the pattern are selected, a phase that is more amenable 
to general treatment is reached. Two methods are most common. One method is 
based on the theory of finite dimensional vector spaces, and the other method is based 
on statistical decision theory. In the first case (nonparametric), an attempt is made 
to construct decision regions in pattern space by the use of hyperplanes, hyperspheres, 
and other surfaces as boundaries between regions. The problem is the selection of the 
correct surface types and the correct locations for these surfaces. In the other case 
(parametric), some statistical distribution is assumed to be a function of parameters, 
and optimal decision rules are used to make classifications. In either case, informa- 
tion is usually inadequate to complete the classifier, and a training technique is used 
to finish the job. These training techniques usually include presenting a series of pre- 
viously classified patterns to the machine and correcting the response until the machine 
"learns" to make correct classifications. 

DISCRIMINANT FUNCTIONS 


In both parametric and nonparametric cases, the concept of function is useful in 
the general problem of mappings from the pattern space. The following definition will 
provide the basis for this discussion of discriminant functions. 

A function from a set R to a set C is a set g of ordered pairs in RxC 
with the property that if (r, c) and (r, c') are elements of g, then (c = c'). 

Thus, given a pattern space R and a set C of categories into which the patterns are 
to be classified, a function maps each element of the pattern space into one and only 
one of the classification categories. The problem devolves into selecting the proper 
function so that the outcomes have some meaning. 

Assume that each point in R belongs to one of the categories $C_1, C_2, \ldots, C_k$.
Then R can be divided into the subsets $R_1, R_2, \ldots, R_k$ according to the category of
classification. Consider a set of scalar functions of the vector $X \in R$,
$[g_1(X), g_2(X), \ldots, g_k(X)]$, chosen so that for all $X \in R_i$

$$g_i(X) > g_j(X) \quad \text{for } i, j = 1, 2, \ldots, k;\ i \neq j \tag{1}$$

Such functions are called discriminant functions. If such a set of functions can be
found, and the outputs feed a maximum detector, the set will classify properly any
vector $X \in R$ (fig. 1).
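
The maximum-detector arrangement can be illustrated with a minimal Python sketch. The three discriminant functions below are hypothetical (they are not taken from this note) and were chosen only so that each one is largest near a different point of a one-dimensional pattern space; the function name `classify` is likewise an assumption.

```python
from typing import Callable, Sequence

Pattern = Sequence[float]
Discriminant = Callable[[Pattern], float]


def classify(x: Pattern, discriminants: Sequence[Discriminant]) -> int:
    """Return the 1-based index i of the largest g_i(x) (the maximum detector)."""
    values = [g(x) for g in discriminants]
    return values.index(max(values)) + 1


# Hypothetical discriminants for three categories on a one-dimensional pattern.
g = [
    lambda x: -abs(x[0] - 0.0),   # largest near 0
    lambda x: -abs(x[0] - 5.0),   # largest near 5
    lambda x: -abs(x[0] - 10.0),  # largest near 10
]

# Classify three sample patterns; each falls nearest a different category.
labels = [classify([v], g) for v in (0.4, 4.0, 9.1)]
```

If two discriminants tie, this sketch returns the lower index; equation (1) assumes that ties do not occur inside any region $R_i$.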


Figure 1. - Linear machine.


LINEAR FUNCTIONS 


The simplest form of a discriminant function is

$$g_i(X) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1} \tag{2}$$

This form is linear in the components of X and is called a linear discriminant
function. Linear discriminant functions describe hyperplanes in n-space. Such functions
are especially useful when the pattern space is to be dichotomized (fig. 2). Any
pattern space that can be divided correctly by linear functions is called linearly separable.

Figure 2. - Linear dichotomizer.


QUADRIC FUNCTIONS 


Consider the equations

$$g(X) = X^t A X + X^t B + D \tag{3}$$

$$g(X) = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} + D \tag{4}$$

$$g(X) = a_{11} x_1^2 + (a_{12} + a_{21}) x_1 x_2 + a_{22} x_2^2 + b_1 x_1 + b_2 x_2 + D \tag{5}$$

This "second-degree equation" specializes to the ellipse and hyperbola. In
generalized n-space, equation (3) is called a quadric discriminant function. Depending on
whether the associated quadratic form $X^t A X$ is positive definite, positive
semidefinite, or nondefinite, a quadric function specializes to a hyperellipsoid, a
hyperellipsoidal cylinder, or a hyperhyperboloid. The orientations of these surfaces are
controlled by the eigenvectors of A. Because of the added flexibility of form, the
quadric function is a more powerful tool than the hyperplane.


THE Φ FUNCTIONS


In expanded form, equation (3) is

$$g(X) = \sum_{i=1}^{n} w_{ii} x_i^2 + \sum_{i=1}^{n-1} \sum_{k=i+1}^{n} w_{ik} x_i x_k + \sum_{i=1}^{n} w_i x_i + w_{n+1} \tag{6}$$

Let F be a vector composed of functions of X

$$F = (f_1, f_2, \ldots, f_M) \tag{7}$$

Let the first n of these functions be

$$(x_1^2, x_2^2, \ldots, x_n^2) \tag{8}$$

the second n(n - 1)/2 functions be

$$(x_1 x_2, x_1 x_3, \ldots, x_{n-1} x_n) \tag{9}$$

and the last n functions be

$$(x_1, x_2, \ldots, x_n) \tag{10}$$

Equations (7) to (10) are the basis for figure 3.

Note that although the quadric function is nonlinear in X, the implementation
shown in figure 3 is linear in F. A one-to-one transformation has been made from the
pattern space to a function space to provide the capability to use the linear-machine
implementation of the preceding discussion. This technique is valuable in the
machine-training phase and can be generalized to higher order, more powerful functions. Any
discriminant function of the preceding form, with the pattern vector applied to a
processor followed by a linearly weighted summer, is a Φ function. These Φ functions
are powerful in terms of the types of surfaces implemented and in view of the fact that
the Φ functions are trained as linear machines.
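
The transformation from the pattern space to F, followed by a linearly weighted summer, can be sketched in Python as follows. This is an illustration under assumptions rather than the implementation used in this note: the function names `quadric_phi` and `linear_discriminant` and the example weights (which realize a unit circle) are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def quadric_phi(x: Sequence[float]) -> List[float]:
    """Map X = (x_1..x_n) to F: n squares, n(n-1)/2 cross products, n linear terms."""
    squares = [xi * xi for xi in x]
    cross = [xi * xk for xi, xk in combinations(x, 2)]
    return squares + cross + list(x)


def linear_discriminant(weights: Sequence[float], f: Sequence[float]) -> float:
    """Weighted sum of F plus the constant weight, stored as the last entry."""
    return sum(w * fi for w, fi in zip(weights, f)) + weights[-1]


# Hypothetical weights implementing g(X) = x_1^2 + x_2^2 - 1 (a unit circle).
# Order: [x_1^2, x_2^2, x_1 x_2, x_1, x_2, constant]
w = [1.0, 1.0, 0.0, 0.0, 0.0, -1.0]

inside = linear_discriminant(w, quadric_phi([0.5, 0.5]))   # negative: inside the circle
outside = linear_discriminant(w, quadric_phi([2.0, 0.0]))  # positive: outside the circle
```

The discriminant is quadratic in X but is evaluated as an ordinary weighted sum over F, which is why the same training procedure used for linear machines applies.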

Figure 3. - Quadric Φ machine.

NONPARAMETRIC METHODS 


In nonparametric methods, the members of different categories generally are
assumed to be separable from each other by some forms of surfaces, and attempts are
made to identify those surfaces. Usually, some form of Φ machine, which is trained
as a linear machine, is used. The error-correction method of training a linear
dichotomizer, which is described in this section, can be generalized to the training of
n-category Φ machines.

The discriminant function of a linear dichotomizer is

$$g(X) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1} = (w_1, w_2, \ldots, w_n, w_{n+1}) \cdot (x_1, x_2, \ldots, x_n, 1) = W \cdot Y \tag{11}$$

where $W = (w_1, w_2, \ldots, w_n, w_{n+1})$ is called the weight vector and
$Y = (x_1, x_2, \ldots, x_n, 1)$ is called an augmented pattern vector. The set of all W
forms a vector space called the weight space. The set of all W such that

$$W \cdot Y = 0 \tag{12}$$

is a hyperplane through (0, 0, ..., 0) called the pattern hyperplane of Y. Consider a
simple example in a two-dimensional weight space with three patterns in a training set.
(Patterns are then real numbers.)

Let

$$Y_1 = (1, 1), \quad Y_2 = (-5, 1), \quad Y_3 = \left(\tfrac{1}{2}, 1\right) \tag{13}$$

then, the following three equations describe the pattern hyperplanes associated with the
three patterns in this training set (fig. 4).

Figure 4. - Error-correction training example.

$$W \cdot Y_1 = w_1 + w_2 = 0 \quad \text{or} \quad w_1 = -w_2 \tag{14}$$

$$W \cdot Y_2 = -5 w_1 + w_2 = 0 \quad \text{or} \quad w_2 = 5 w_1 \tag{15}$$

$$W \cdot Y_3 = \tfrac{1}{2} w_1 + w_2 = 0 \quad \text{or} \quad w_1 = -2 w_2 \tag{16}$$

Consider the hyperplane in equation (14) (fig. 4) and any W such that $w_2 > -w_1$. Then
$W \cdot Y_1 > 0$, and an arrow on the hyperplane indicates the positive side. Similarly, the
hyperplanes in equations (15) and (16) (fig. 4) have positive sides; that is, any weight
vector on that side will give a positive response when coupled with that pattern vector.
Suppose that $Y_1$ and $Y_3$ belong to category 1 and $Y_2$ belongs to category 2. Then,
a proper discriminant function should be

$$g(X) = W \cdot Y > 0 \ \text{for } X_1, X_3; \qquad g(X) = W \cdot Y < 0 \ \text{for } X_2 \tag{17}$$

Any weight vector in the crosshatched section of figure 4 satisfies these relationships.
Such a section is called the solution region. In error-correction training, the
procedure starts with an arbitrary weight vector that is adjusted until a vector in the
solution region is found.

Start with the weight vector (-1.5, 1) and examine the discriminant function
output for $Y_3$, which is correctly positive. No adjustment is required. For $Y_1$, the
response is incorrectly negative, and adjustment is required. In figure 4, the most
advantageous direction to move W is perpendicular to equation (14). This movement
is accomplished by taking

$$W' = W + Y_1 \tag{18}$$

which produces a weight vector that gives a correctly positive answer for $Y_1$. By the
use of W', $Y_2$ produces an incorrectly positive number. Then, take

$$W'' = W' - Y_2 = (-0.5, 2) - (-5, 1) = (4.5, 1) \tag{19}$$

This weight vector is in the solution region and will give correct answers to all the
training patterns. The machine is trained.

Thus, the nonparametric training method involves successively testing each
vector in a training set and making corrections until the machine ceases to make errors.
For separable sets, this procedure converges to a solution after only a finite number
of operations. However, in the correction of one error, the weight vector may be
moved into a region causing an error from another pattern vector. Thus, the training
process may terminate only after many cycles through the training set. In cases in
which the sets are not separable, this training process will never terminate, and other
techniques, such as the nearest neighbor approach (ref. 11), must be used. The
appendix contains a computer implementation of this technique for an nth-degree Φ
machine.
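
The error-correction procedure for a linear dichotomizer can be sketched in Python as shown below, using the three training patterns and the starting weight vector of the worked example. The function names and the `max_cycles` cap are assumptions added for illustration; the cap only keeps the sketch from looping indefinitely on nonseparable sets.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], y: Sequence[float]) -> float:
    return sum(wi * yi for wi, yi in zip(w, y))


def train(samples: Sequence[Tuple[Sequence[float], int]],
          w0: Sequence[float],
          max_cycles: int = 1000) -> List[float]:
    """Fixed-increment error correction for a linear dichotomizer.

    samples holds (augmented pattern Y, category) pairs; category 1 should
    give W . Y > 0 and category 2 should give W . Y < 0.
    """
    w = list(w0)
    for _ in range(max_cycles):  # max_cycles is an assumed safety cap
        errors = 0
        for y, category in samples:
            response = dot(w, y)
            if category == 1 and response <= 0:
                w = [wi + yi for wi, yi in zip(w, y)]   # W' = W + Y
                errors += 1
            elif category == 2 and response >= 0:
                w = [wi - yi for wi, yi in zip(w, y)]   # W' = W - Y
                errors += 1
        if errors == 0:  # a full cycle without errors: W is in the solution region
            return w
    raise RuntimeError("no solution found; the sets may not be separable")


# Training set of the worked example: Y_1 and Y_3 in category 1, Y_2 in category 2.
samples = [((1.0, 1.0), 1), ((-5.0, 1.0), 2), ((0.5, 1.0), 1)]
w = train(samples, w0=(-1.5, 1.0))
```

With the patterns presented in this order, the sketch makes the same two corrections as equations (18) and (19) and stops at the weight vector (4.5, 1).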


PARAMETRIC METHODS 


Parametric methods depend on a field of mathematical statistics that is called 
decision theory (ref. 13). Decision theory involves making optimum decisions in the 
absence of deterministic conditions. For instance, two sets may be nonseparable but 
distributed in ways that allow a best choice between the two. 

The idea of loss is inherent in the concept of best choice. That is, what is lost
because of an incorrect decision? A loss function assigns values to the losses incurred
in a variety of error situations. Many such loss functions exist, but only the
symmetrical loss function will be considered.

$$\lambda(i/j) = 1 - \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{20}$$

where $\lambda(i/j)$ is the loss caused by deciding on i, when actually j has occurred. Thus,
a loss of 1 occurs for any error, and a loss of 0 occurs for a correct decision. Using
this loss function, an optimum classifier will be derived for pattern sets distributed in
a Gaussian manner about the means.

The conditional average loss in classifying a vector X into category i can be
written as

$$L_X(i) = \sum_{j=1}^{k} \lambda(i/j)\, p(j/x) \tag{21}$$

By Bayes rule

$$p(j/x) = \frac{p(x/j)\, p(j)}{p(x)} \tag{22}$$

yielding

$$L_X(i) = \frac{1}{p(x)} \sum_{j=1}^{k} \lambda(i/j)\, p(x/j)\, p(j) \tag{23}$$

where $L_X(i)$ is to be minimized with respect to i. Because p(x) is not a function of
i, p(x) may be deleted from the equation.

$$L_X'(i) = \sum_{j=1}^{k} \lambda(i/j)\, p(x/j)\, p(j) \tag{24}$$

Substitution of equation (20) into equation (24) yields

$$L_X'(i) = \sum_{j=1}^{k} (1 - \delta_{ij})\, p(x/j)\, p(j) = \sum_{\substack{j=1 \\ j \neq i}}^{k} p(x/j)\, p(j) = p(x) - p(x/i)\, p(i) \tag{25}$$

This expression is minimized with respect to i by maximizing p(x/i)p(i).

Thus, the optimum discriminant function is

$$g_i(X) = p(x/i)\, p(i) \tag{26}$$

Because a logarithmic form is more useful for the following discussion, use is made
of the monotonic property of the logarithm to give

$$g_i'(X) = \ln g_i(X) = \ln p(x/i) + \ln p(i) \tag{27}$$
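
The equivalence between minimizing the modified conditional average loss and maximizing p(x/i)p(i) can be checked numerically with a short Python sketch. The likelihood and prior values below are hypothetical and serve only as an illustration; categories are indexed from 0 in the code.

```python
from typing import List, Sequence


def modified_loss(i: int, likelihoods: Sequence[float], priors: Sequence[float]) -> float:
    """L'_X(i) = sum over j of lambda(i/j) p(x/j) p(j), with lambda(i/j) = 1 - delta_ij."""
    return sum(
        (0.0 if i == j else 1.0) * likelihoods[j] * priors[j]
        for j in range(len(priors))
    )


# Hypothetical values of p(x/j) and p(j) for k = 3 categories at one pattern x.
likelihoods = [0.10, 0.40, 0.25]
priors = [0.5, 0.2, 0.3]

losses: List[float] = [modified_loss(i, likelihoods, priors) for i in range(3)]
products = [l * p for l, p in zip(likelihoods, priors)]

best_by_loss = losses.index(min(losses))         # category minimizing L'_X(i)
best_by_product = products.index(max(products))  # category maximizing p(x/i) p(i)
```

Both selections pick the same category, as equation (25) requires, because each loss equals the sum of all products minus the product for the chosen category.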


For a multivariate Gaussian distribution

$$p(x/i) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (X - M_i)^t \Sigma_i^{-1} (X - M_i) \right] \tag{28}$$

where X is the pattern vector in column form, $M_i$ is the mean vector in column
form, and

$$\Sigma_i = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix} \tag{29}$$

is called the covariance matrix. Then

$$g_i'(X) = -\frac{n}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} \left[ (X - M_i)^t \Sigma_i^{-1} (X - M_i) \right] + \ln p_i \tag{30}$$

which, because the term $-\frac{n}{2} \ln 2\pi$ is the same for every category and does not
affect the comparison, simplifies to

$$g_i'(X) = a_i - \frac{1}{2} \left[ (X - M_i)^t \Sigma_i^{-1} (X - M_i) \right] \tag{31}$$

with

$$a_i = \ln p_i - \frac{1}{2} \ln |\Sigma_i| \tag{32}$$

which is recognized as a form of the quadric function

$$g_i(X) = X^t A X + X^t B + D \tag{33}$$

Thus, quadric discriminant functions are optimum for Gaussian patterns. Obviously,
a different loss function or distribution would produce a different result, but the
Gaussian distribution approximates many natural phenomena and is useful. The mean
vector and the covariance matrix are estimated from a training set in any of a variety
of ways.
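
As an illustration of equations (31) and (32), the following Python sketch evaluates the Gaussian discriminant for two-dimensional patterns and two categories. It is restricted to n = 2 so that the matrix inverse can be written out without a library; the means, covariance matrices, and prior probabilities are hypothetical values, not estimates from any data in this note.

```python
import math
from typing import Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def gaussian_discriminant(x: Sequence[float], mean: Sequence[float],
                          cov: Matrix2, prior: float) -> float:
    """g'_i(X) = ln p_i - (1/2) ln|Sigma_i| - (1/2)(X - M_i)^t Sigma_i^-1 (X - M_i)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2 x 2 matrix written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * quad


# Hypothetical parameters for two categories; in practice they are estimated
# from a training set.
cat1 = dict(mean=(0.0, 0.0), cov=((1.0, 0.0), (0.0, 1.0)), prior=0.5)
cat2 = dict(mean=(3.0, 3.0), cov=((1.0, 0.0), (0.0, 1.0)), prior=0.5)


def classify(x: Sequence[float]) -> int:
    """Select the category with the larger discriminant value."""
    g1 = gaussian_discriminant(x, **cat1)
    g2 = gaussian_discriminant(x, **cat2)
    return 1 if g1 > g2 else 2
```

With equal covariance matrices and equal priors, as assumed here, the quadratic terms cancel in the comparison and the decision boundary reduces to a straight line midway between the two means.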

AN ERROR-CORRECTION EXAMPLE 


The Φ machines are similar to polynomial curve-fitting programs in that,
without a priori knowledge of the exact order of the required surface, the machines attempt
to fit surfaces to sets of data. An example of an nth-degree machine, which is
discussed in this section, is a computer implementation that fits an nth-degree polynomial
between separable sets of data in two-dimensional Cartesian coordinates. That is, the
discriminant function is

$$g(x, y) = w_n x^{n-2} + w_{n-1} x^{n-3} + \cdots + w_3 x + w_2 + w_1 y \tag{34}$$

Several sets of data were used in an investigation into the convergence rates and
processing times to be expected with this class of problem.

This example makes use of the error-correction method described previously. 
An attempt was made to remove the time-consuming iterative processes from the ac- 
tual training loop. For instance, in fitting a fifth-degree polynomial, the fifth, fourth, 
third, and second powers of the abscissa value are used repeatedly in the training loop. 
To save time, these powers of the abscissa are calculated and stored as new variables 
in the program before the error-correction iterations begin. Thus, only those opera- 
tions involved in the calculation of the discriminant value and the adjustment of the 
weight vector are retained as iterative processes. The actual computer program is 
shown in figure 5, and a discussion of the mechanics is in the appendix. 

Five sets of data and the machine (Univac 1108) convergence times for various
orders of discriminant functions are shown in figures 6 to 11. The first set of data (fig. 6)
involved two straight lines separated by a relatively large space. Convergence times
for the first- to fourth-degree curves are shown in figure 7. The second set of data
(fig. 8) also involved two straight lines, but these lines had considerably reduced
spacing. Processing times increased slightly, and the fourth-degree polynomial did not
converge in 10 minutes of machine time. Similarly, parabolic (fig. 9), cubic (fig. 10),
and quartic (fig. 11) sets of data were tried with relatively narrow separations. In no
case did the fourth-degree discriminant function converge within 10 minutes.
In figure 7, the processing time is plotted on a logarithmic scale. With this
scale, the processing times form a basically straight line with respect to the order of
the fitted polynomial. Because the processing time increases exponentially with order,
an increasingly high price is paid for additional dimensions or components in the
training vector. The actual shapes of the two sets of data are relatively insignificant; that
is, if the form factors remain approximately the same, the program will fit a
third-degree equation to sets of cubic data almost as well as to sets of linear data. Because
of the significant amount of processing time that is required, this particular technique
would not be an effective tool for fitting high-order polynomials. Other techniques that
use pattern recognition to fit curves have been developed (ref. 14). In general, these
techniques allow some error and use elaborate gerrymandering to decrease processing
time to a minimum.

Figure 6. - Linear (wide) curve-fit example.

Figure 7. - Example convergence times.

Figure 8. - Linear (narrow) curve-fit example.

Figure 10. - Cubic curve-fit example.

Figure 11. - Quartic curve-fit example (did not converge).


CONCLUDING REMARKS 

Pattern recognition is based on the idea of examining a set of measurements
(called a pattern) of a phenomenon and deriving decisions from the characteristics of
the pattern. Usually, one of two decisionmaking procedures is used. The
nonparametric procedure is a deterministic approach that seeks out separating boundaries or some
other rules that allow precise decisions. The parametric machines attempt to make
optimum decisions based on the statistical properties of patterns, always recognizing
the probabilities of making an error.

The components of the pattern must be selected carefully to represent, with as few 
components as possible, the phenomenon of interest. This careful selection of compo- 
nents is necessary because processing time increases as a function of the number of 
components of the pattern. In the example, processing time increases exponentially 
with the number of components. 

Pattern recognition is an active field. Several hundred technical reports and
journal articles on the subject are published annually. Most large engineering schools
have established specialty areas in the field of learning machines. The general theory
is now well established and may be adapted easily to the needs of the individual user.


Manned Spacecraft Center 

National Aeronautics and Space Administration 
Houston, Texas, February 20, 1970 
914-50-50-16-72 

APPENDIX


AN nth-DEGREE Φ MACHINE


The computer program that is shown in figure 5 uses FORTRAN V and has been 
run on the Univac 1108 computer. A block diagram is shown in figure 12. 

The following list is a description of the key data cards that are used in the oper- 
ation of this program. 

1. Initial Data Card — This card contains the key parameters that are used in 
starting the program. Because maximum flexibility was desired, the program was de- 
signed to do, with one input, several training runs with different orders of discrimi- 
nant functions. The following parameters must appear on the initial data card. 

a. NRUN (integer value in columns 1 to 10) — Controls the number of train- 
ing runs 

b. NPTS (integer value in columns 11 to 20) — Specifies the number of vec- 
tors in each training set 

c. NX (integer value in columns 21 to 30) — Specifies the number of
discriminant values to be calculated after training

d. SLIP (floating point number in columns 31 to 40) — Sets the spacing 
between the discriminant values to be calculated 

e. STRT (floating point number in columns 41 to 50) — Sets the value of the 
first point at which a discriminant value is to be calculated 

2. Training-Vector Data Cards — There are NPTS of these cards.
All vectors belonging to one set are separated from those of the other set. Each set
must contain an equal number of vectors. The vectors are punched, one from each
set per card, with the abscissa and ordinate from set 1 in columns 1 to 10 and 11 to 20,
respectively, and the values from set 2 in columns 21 to 30 and 31 to 40, respectively.

3. Discriminant-Order Cards — There are NRUN of these cards.
An integer value, which is the order of the discriminant function to be used during that
run, is placed in columns 1 to 10.

This program is a straightforward representative implementation of the
error-correction training method. The following list is a step-by-step explanation of the
block diagram in figure 12.

1. After reading the Initial Data Card and the Training-Vector Data Cards, the
program sets up the outer loop (DO 190) that counts the number of runs. The program
then reads the order of the first discriminant function to be derived and sets the
internal timer to 0.

2. The DO 1 loop, which maps the input training vectors into Φ space, is the
Φ processor. The results of this loop are two sets of vectors in memory that have
NWTS number of components each, instead of two components each. After this loop,
the machine operates linearly.

3. All weight-vector components are set to 1.

4. The error counter, INC, is set to 0.

5. The DO 6 loop starts the error-correction procedure. A discriminant value
is calculated for each training vector. A negative error causes the training vector to
be added to the weight vector. Conversely, a positive error causes the training vector
to be subtracted from the weight vector. If an error occurs, INC is incremented by 1.

6. After all training vectors have cycled through DO 6, INC is tested. If INC is
not 0, the program cycles back to step 4, and the process is repeated. If INC is 0, the
computer does CALL TIME, which gives the time elapsed since CALL RESET.

7. The program prints the components of the weight vector.

8. The program calculates and prints the desired number of discriminant values
at the desired abscissas.

9. The program prints the time required for training.

10. If the DO 190 index has not been satisfied, the program recycles. Otherwise,
the program STOPS.

NRUN - Number of different polynomial orders to be tried
NPTS - Number of vectors in each training set
NX - Number of points at which ordinate is to be calculated
SLIP - Distance between points at which ordinates are to be calculated
STRT - Initial abscissa
KNWTS(L) - The number of components of the weight vector for the Lth polynomial tried

Figure 12. - An nth-degree Φ machine block diagram.
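
Steps 2 to 6 of the procedure can be sketched in Python as follows. This is not a transcription of the FORTRAN V program; it is an illustration under assumptions: the function names, the `max_cycles` cap, the convention that set 1 should give positive discriminant values and set 2 negative ones, and the training points are all hypothetical. Card reading, timing, and printing (steps 1 and 7 to 10) are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def phi(point: Point, order: int) -> List[float]:
    """Phi processor: map (x, y) to (y, 1, x, x^2, ..., x^order), computed once."""
    x, y = point
    return [y] + [x ** p for p in range(order + 1)]


def train_phi_machine(set1: Sequence[Point], set2: Sequence[Point], order: int,
                      max_cycles: int = 100000) -> List[float]:
    # Step 2: map both training sets into phi space before the training loop.
    f1 = [phi(p, order) for p in set1]
    f2 = [phi(p, order) for p in set2]
    # Step 3: all weight-vector components start at 1.
    w = [1.0] * (order + 2)
    for _ in range(max_cycles):  # max_cycles is an assumed safety cap
        inc = 0  # Step 4: error counter
        # Step 5: one pass of error correction over every training vector.
        for f in f1:  # set 1 should give a positive discriminant value
            if sum(wi * fi for wi, fi in zip(w, f)) <= 0:
                w = [wi + fi for wi, fi in zip(w, f)]
                inc += 1
        for f in f2:  # set 2 should give a negative discriminant value
            if sum(wi * fi for wi, fi in zip(w, f)) >= 0:
                w = [wi - fi for wi, fi in zip(w, f)]
                inc += 1
        if inc == 0:  # Step 6: a full pass with no errors ends training
            return w
    raise RuntimeError("training did not converge within max_cycles")


# Hypothetical, widely separated training sets: points above and below y = x.
above = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
below = [(0.0, -2.0), (1.0, -1.0), (2.0, 0.0)]
weights = train_phi_machine(above, below, order=1)
```

For this small first-order case the loop stops after two passes with the weights (2, -1, -2), which correspond to the separating line y = x + 0.5. Computing the powers of the abscissa once, before the loop, mirrors the time-saving measure described for the program.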


NASA-Langley, 1970 